Playing repeated Stackelberg games with unknown opponents

Authors

  • Janusz Marecki
  • Gerald Tesauro
  • Richard Segal
Abstract

In Stackelberg games, a “leader” player first chooses a mixed strategy to commit to, then a “follower” player responds based on the observed leader strategy. Notable strides have been made in scaling up the algorithms for such games, but the problem of finding optimal leader strategies spanning multiple rounds of the game, with a Bayesian prior over unknown follower preferences, has been left unaddressed. To remedy this shortcoming, we propose a first-of-a-kind tractable method to compute an optimal plan of leader actions in a repeated game against an unknown follower, assuming that the follower plays a myopic best response in every round. Our approach combines Monte Carlo Tree Search, which handles the leader's exploration/exploitation tradeoffs, with a novel technique for identifying and pruning dominated leader strategies. The method provably finds asymptotically optimal solutions and scales up to real-world security games spanning a double-digit number of rounds.
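
The single-round setting described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm (it contains no Monte Carlo Tree Search and no pruning); it only shows a follower's myopic best response to a committed leader mixed strategy, and the leader's expected utility under a Bayesian prior over follower types. The function names, the payoff matrices, and the tie-breaking rule (lowest action index wins) are all assumptions made for illustration.

```python
from typing import Sequence

Matrix = Sequence[Sequence[float]]  # payoff[i][j]: leader action i, follower action j


def follower_best_response(leader_mix: Sequence[float], follower_payoff: Matrix) -> int:
    """Return the follower action with the highest expected payoff
    against the observed leader mixed strategy.
    Assumption: ties are broken in favor of the lowest action index."""
    n_follower = len(follower_payoff[0])
    expected = [
        sum(p * follower_payoff[i][j] for i, p in enumerate(leader_mix))
        for j in range(n_follower)
    ]
    return max(range(n_follower), key=expected.__getitem__)


def leader_expected_utility(
    leader_mix: Sequence[float],
    leader_payoff: Matrix,
    follower_types: Sequence[Matrix],
    prior: Sequence[float],
) -> float:
    """Average the leader's payoff over a prior on follower types,
    where each type plays its own myopic best response."""
    total = 0.0
    for weight, follower_payoff in zip(prior, follower_types):
        j = follower_best_response(leader_mix, follower_payoff)
        total += weight * sum(
            p * leader_payoff[i][j] for i, p in enumerate(leader_mix)
        )
    return total


# Hypothetical 2x2 game with two equally likely follower types.
leader_payoff = [[2.0, 4.0], [1.0, 3.0]]
follower_types = [
    [[1.0, 0.0], [0.0, 2.0]],  # type A: response depends on the leader's mix
    [[1.0, 0.0], [1.0, 0.0]],  # type B: always prefers action 0
]
prior = [0.5, 0.5]
value = leader_expected_utility([0.5, 0.5], leader_payoff, follower_types, prior)
print(value)
```

In a repeated game, each observed follower response rules out the follower types inconsistent with it, which is what gives rise to the leader's exploration/exploitation tradeoff mentioned in the abstract.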

Similar resources

Learning against sequential opponents in repeated stochastic games

This article considers multiagent algorithms that aim to find the best response in strategic interactions by learning about the game and their opponents from observations. In contrast to many state-of-the-art algorithms that assume repeated interaction with a fixed set of opponents (or even self-play), a learner in the real world is more likely to encounter the same strategic situation with cha...

Learning to Play Stackelberg Security Games

As discussed in previous chapters, algorithmic research on Stackelberg Security Games has had a striking real-world impact. But an algorithm that computes an optimal strategy for the defender can only be as good as the game it receives as input, and if that game is an inaccurate model of reality then the output of the algorithm will likewise be flawed. Consequently, researchers have introduced ...

Learning in and about Games

We study learning in finitely repeated 2×2 normal form games, when players have incomplete information about their opponents’ payoffs. In a laboratory experiment we investigate whether players (a) learn the game they are playing, (b) learn to predict the behavior of their opponent, and (c) learn to play according to a Nash equilibrium of the repeated game. Our results show that the success in ...

Towards a Fast Detection of Opponents in Repeated Stochastic Games

Multi-agent algorithms aim to find the best response in strategic interactions. While many state-of-the-art algorithms assume repeated interaction with a fixed set of opponents (or even self-play), a learner in the real world is more likely to encounter the same strategic situation with changing counter-parties. This article presents a formal model of such sequential interactions, and a corresp...

Reputation in Perturbed Repeated Games

The paper analyzes reputation effects in general perturbed repeated games with discounting. If there is some positive prior probability that one of the players is committed to play the same (pure or mixed) action in every period, then this provides a lower bound for her equilibrium payoff in all Nash equilibria. This bound is tight and independent of what other types have positive probability. ...

Publication date: 2012